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Abstract 

We comment on a recent approach to spatial stochastic inversion, which centers on a con- 
cept known as "anchors" and conducts nonparametric estimation of the hkehhood of the anchors 
(along with other model parameters) with respect to data obtained from field processes. Concep- 
tual and technical observations are made regarding the development, interpretation, and use of 
this approach. We also point out that this approach has been superseded by new developments. 
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Zhang [2008c] introduced a new stochastic approach, named "anchored inversion" , to a type of 
spatial inverse problems. Conceptually, the approach is centered on a parameterization device that 
"transforms information" contained in "indirect data" to the form of "direct data" . Technically and 
computationally, it is centered on a nonparametric procedure for estimating multivariate likelihood. 
The approach was fully described and implemented in Zhang and Rubin [2008a], which also saw 
adoption of the name "anchor" for the parameterization device and "anchored inversion" for the 
methodology. 

This method was re-presented in Rubin et al. [2010] by massive copy-pasting and rephrasing 
without the knowledge, consent, and acknowledgement of Zhang (see http : / /www . Anchoredlnversion 
info). The re- presentation contained numerous technical errors and conceptual misinterpretations. 
In addition, key steps of the technical core, namely nonparametric estimation of likelihood, were 
omitted, rendering the description severely incomplete. 

In this note, we point out some of the errors in Rubin et al. [2010], clarify concepts in the 
method, fill in the missing technical component, and make additional comments on the method. 
It is hoped that this effort will help reduce misunderstandings about the method caused by the 
technically flawed presentation of Rubin et al. [2010]. 

In the end we point out that the method described in Zhang and Rubin [2008a] (or Rubin et al. 
[2010]) has been superseded by a radically different, and superior, method. 

1 Concepts of data classification and anchors 

The design philosophy of the method is explained in the sections 5 and 9 of Zhang and Rubin 
[2008a]; the technical details of the method are given in the sections 3, 4, 6 and the Appendix of 
Zhang and Rubin [2008a]. Also see Zhang and Rubin [2008b] (which uses the less preferred name 
"method of anchored distributions"). These materials are not to be repeated here. 

Following the notation of Zhang and Rubin [2008a] , Y refers to the spatial attribute of interest 
(such as hydraulic conductivity), x indicates spatial location. Geostatistical (or "structural") pa- 
rameters are denoted by 9; anchors are denoted by i!}. The forward process (or model) is denoted 
by M. 
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Zhang and Rubin [2008a] classifies on-site data into two types. Type A data {za) are direct 
values of Y at individual locations. Type B data (zf,) are functions of Y at multiple locations 
(usually the entire domain). Type A data are so defined that they can be used directly in creating 
conditional fields (see (A-3) in Zhang and Rubin [2008a], or (13) in Rubin et al. [2010]). As 
for type B data, it is important to note that they are functions of the Y field. One can not 
establish correspondence between a type B data component and the Y values (or anchors) at 
specific locations. Moreover, type B data may not be "spatial" or possess measurement "locations" 
in the usual sense. For example, the data may be a time series of tracer concentrations in a transport 
experiment, measured at one or more locations; or they may even be a summary of such a time 
series. Measurement locations, if any, of type B data are an internal business of the forward model 
Ai. They are not a concern to the inversion procedure. What the inversion procedure expects is 
a prediction vector M{y), produced by the forward model for any field realization y, that can be 
compared with the observed data vector, zt,. 

In the definition of type A data, Rubin et al. [2010] calls y a "known function" (below for- 
mula (1)). This is inaccurate. Here, the data must be the spatial attribute Y itself (at locations 
Xa) and not any transform thereof, because otherwise the use of the data Za in all subsequent 
formulas needs to be modified. However, the data may have been obtained through transforma- 
tions. For example, one may arrive at an estimate of the hydraulic conductivity at location x using 
empirical relations that connect conductivity with core-support geophysical properties obtained at 
that location. In this case, the type A data Za are the estimated conductivity, rather than the 
geophysical property (which is an empirical function of conductivity). The transformation may 
well be nonlinear. 

At times Rubin et al. [2010] is confused about the concept of type B data and talks about 
anchors "corresponding to" specific type B data components; see, e.g., paragraphs [18], [58], [82]. 
A further inaccuracy appears in paragraph [44] of Rubin et al. [2010], which states that "M(y) 
could be used to generate the field or possibly a number of fields ^b" . This statement is misleading 
in that Zh (value of the forward model) is not a "field" . The use of the symbol zj, here (likewise 
in paragraph [36], and in the "Notation" with entry "z;,") is confusing, noticing that it has been 
declared in paragraph [15] that "the tilde sign over a variable denotes a field of that variable". It 
is noted that the type B data (i.e. Zf,) need not be anything like a "field" in the meaning that is 
familiar to the audience of that article. For example, the vector Zb may be the combination of a 
few head measurements and a time series dataset of tracer concentration. Besides, M{y), given 
a y, can generate only one Zb, not "a number of" Zf,. The reason is that the forward function is 
deterministic with y. 

Overall, the role of anchors is to connect data (both types A and B) with the unknown field Y. 
Rubin et al. [2010, paragraph [23]] mistakenly mentions the role of the anchors as a "mechanism 
for connecting between type A and type B data" . 

A conceptual justification for the use of anchor is given in Zhang [2008c, sec. 3.3]. The idea 
is that (after appropriate choice of anchors and adequate inference of their posterior distribution) 
anchors "capture" the information contained in data Zb about the field Y, so that simulation of Y 
conditional on Zb can be substituted by simulation conditional on the anchors. Zhang and Rubin 
[2008a, first paragraph of sec. 5] explains this idea as follows. 

A common goal in inverse modeling is to create simulations of the field Y that are 
"conditioned" on available data... When the available data include type-B data, a 
common practice... generates conditional simulations... by sampling the conditional dis- 
tribution of the field, p(y \ ^, ■!?)... There are conceptual difficulties in this treatment, to 
be discussed in section 8.4. A natural reason for this treatment is the absence of an 
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intermediate device. This device would guarantee that fields simulated with the help 
of this device are already conditioned on the type-B data, therefore there is no need to 
verify (or reject) the conditioning by evaluating the function M on the created fields. 

This explanation is re-used in Rubin et al. [2010, paragraph [16]]. More details along this line are 
given in Zhang and Rubin [2008a, sees. 5, 6.3], which are reused in Rubin et al. [2010, sees. 2.2, 5]. 
However, there are issues with this justification. 

2 Numerical estimation of likelihood 

The computational core of the method is numerically estimating the likelihood The 
central position of this likelihood term in the entire methodology is evident in formula (6) and 
Figure 1 of Zhang and Rubin [2008a] (or formula (3) and Figure 1 of Rubin et al. [2010]). That this 
likelihood is nonparametrically, numerically estimated is a defining feature of anchored inversion. 
The other components such as prior specification, simulation, and prediction are not unique to this 
method. 

The importance of the numerical estimation of this likelihood is also shown in Rubin et al. 
[2010] at multiple places, such as in paragraphs [23], [33]-[37], [73], [121]. However, Rubin et al. 
[2010] does not provide technical details of this component. Even the dedicated section 3.2 only 
contains a sketchy verbal description in paragraph [36], which is borrowed from Zhang and Rubin 
[2008a, paragraph 4 of sec. 6.2]. (The same description also appears in Murakami et al. [2010, 
paragraph 5 of sec. 3.1.1] and Murakami [2010, sec. 2.3.4]). This unfortunate omission creates an 
awkward situation: the article by Rubin et al. [2010] presents a methodology without presenting 
how it works. 

The numerical procedure of likelihood estimation adopted in Zhang [2008c, sec. 7.2] and Zhang 
and Rubin [2008a, sec. 6.2] is the k-th nearest neighbor estimator. Let zi,..., Zn be an i.i.d. sample 
from the d-variate (continuous) distribution of Z, whose density function is denoted by /. The 
density at zi, (i.e. the observed type B data vector) is 



volume of the unit hypersphere of dimension d. Denote by r the distance between Zfj and its k-th 
nearest neighbor in the sample {^i}" • Then an estimate of the density is 

= — (2) 

As n — )• oo and r — )• 0, this estimate approaches the true density. Loftsgaarden and Quesenberry 
[1965] suggests y/n as an empirical choice of k. 

There is a large statistical literature on multivariate density estimation. ("Likelihood" is a 
probability density.) It is not difficult to adjust specifics of this procedure, but the spirit of non- 
parametric, numerical estimation remains. 

Rubin et al. [2010, paragraphs [37], [109]] points out, using the observations in [Zhang and Ru- 
bin, 2008a, list after paragraph 4 in sec. 6.2], that only the likelihood at the observed Zf, value needs 
to be estimated; moreover, the estimation may use a large sample size n as long as computational 
resources permit. Rubin et al. [2010, paragraph [37]] also cites the relatively low dimensionality of 
{0, -d) as a factor that contributes to the method's computational efficiency compared with "full-grid 
inversion". However, anchored inversion and "full-grid inversion" use their respective parameter 




(1) 



where /(•) is the identity function. 



is the Euclidean distance, and Vd = vr' 



■'^/Vr(l + d/2) is the 
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vectors, {6, i?) versus y, in different ways; their relative computational cost is unrelated to the 
comparison of their parameter dimensionalities. 

As mentioned above, it is a signature of the method that no parametric form is assumed for the 
likelihood function p{zb | rather, the likelihood is estimated nonparametrically. This decision 
is necessitated by the largely intractable model errors; see Zhang and Rubin [2008a, sec. 8.4] for 
an explanation. Rubin et al. [2010, paragraphs [23], [34-35]] states that the likelihood ^(zft | 6*, i?) 
may be taken to be either parametric or nonparametric. The application paper by Murakami et al. 
[2010] actually makes an attempt to assume Gaussian likelihoods. However, it is important to note 
that the distribution of the forward process output is dependent on (1) the nature of the particular 
process; (2) the field parameter (6 and ■!?). The distribution (both shape and parameters) may 
change with any change of either of these two factors. Besides, it is a misconception that parametric 
methods are more efficient than nonparametric ones. This is the case only when the parametric 
form is correct. 

On the other hand, if one does assume a parametric form for the likelihood function (with 
known parameter values, whether or not the assumptions are reasonable), then the problem falls 
squarely in the realm of Markov chain Monte Carlo (MCMC) and related sampling schemes. In 
that case there is no advantage in using the newly proposed method. 

3 Specification of prior 

The anchored inversion methodology is open and flexible with respect to the geostatistical formu- 
lation and the prior specification for the model parameters, 9 and "d. Nevertheless, these technical 
details are important because (1) one has to make choices for these technicalities in order to get 
the procedure running, and (2) the details affect the performance. Of these technicalities, the 
specification of prior is the most important. 

Zhang [2008c] and Zhang and Rubin [2008a] adopt a geostatistical formulation that includes 
parameters for trend, variance, scale, nugget, and the Matern correlation. In addition, Box-Cox 
transform is performed to deal with possible nonnormality of Y. The prior for the geostatistical 
parameters is given by formula (A-2) in Zhang and Rubin [2008a]. This prior has been proposed 
by Pericchi [1981] to work with the Box-Cox transform. This prior is re-used by Rubin et al. [2010, 
formula (11)]. However, Rubin et al. [2010] choses to drop the Box-Cox transform. As a result, the 
reason for using this particular form of prior is lost. The same problem exists in Murakami [2010, 
sec. 2.5.1]. 

4 The Matern covariance model 

The Matern covariance (or correlation) model is an important tool in spatial statistics. Zhang 
[2008c, sec. 3.2] and Zhang and Rubin [2008a, Appendix] include brief introductions to this tool. 
An especially valuable asset of the Matern model is its parameter k, which concerns the smoothness 
of the random field bearing this spatial association modeled by the Matern covariance function. As 
K increases, the field becomes smoother in a certain statistical sense. One can choose a particular 
value of K based on this interpretation, as is done in Zhang and Rubin [2008a]. Examining the 
full range of k is usually not a good idea: it is both unnecessary and infeasible. It is infeasible 
because it necessitates numerical evaluation of the Bessel function (admittedly this is not a huge 
obstacle in this age). It is unnecessary because one or a few reasonable smoothness levels (k 
values) usually provide adequate characterization of ones random field under consideration (as 
far as smoothness is concerned). A deeper reason for skipping a "full-range" examination lies in 
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the intimate interactions between the smoothness, scale (or range), and variance parameters; the 
interactions cause difficulties in identifiability. 

What is debatable in the use of the Matern model in Nowak et al. [2010] and Rubin et al. [2010] 
is that both articles advocate the notion that the full range of (0, cxd) be explored for k, and this 
act be interpreted as "model averaging". The authors emphasize the notion that different values 
of K can be viewed as different covariance models. 

If one limits n to several discrete values, then the posterior of p{k,) places certain weight on 
each of these values, hence the estimated correlation structure can be viewed as a weighted average 
of these special cases. If one picks a single estimate, it amounts to a selection from these several 
special cases. This model averaging or selection is accomplished via parameter estimation (of k), 
as opposed to some external measure that distinguishes the performance of those special cases 
considered. Such model averaging interpretation is less useful when k is treated as a continuous 
variable, because the continuously changing k represents a coherent concept of "smoothness" , and 
thinking of each arbitrary value of k as a special "model" in its own right is not beneficial. Another 
situation where the "model averaging/selection" perspective is less useful is when the correlation 
function is not the sole focus of the discussion but rather constitutes part of a large parameter 
vector. 

In the effort to introduce the Matern model and promote the "model selection/averaging" 
interpretation, Rubin et al. [2010, sec. 4] makes some statements that harm the coherency of 
the description of the inversion method. In paragraph [47], for example, the article confuses the 
"forward model" with the "covariance model". It refers to any particular value of the parameter 
vector (0,1?) as a particular (forward) "model", and talks about the "probability" of each such 
model, while {9, "i?) in MAD is the model parameter and as such certainly can take different values 
according to its posterior distribution. In paragraph [48] it is claimed that "equation (3)... could 
be written for each of the N combinations of 6i, 'di instead of a single 0, combination". However, 
equation (3) is about parameters 9 and i9, which can take various values (or "combinations") in the 
ffist place. The equation is never about "a single 9, 'd combination". For a single 9, •& combination, 
one needs only to plug the specific values into the equation. 

5 Choice of anchor locations 

Since anchors are parameters defined by the user, the user needs to make decisions on how many 
anchors to use and where to place them. 

Zhang [2008c, sec. 7.1] outlines factors that should be considered while determining the place- 
ment of anchors. Further details are given in Zhang and Rubin [2008a, sec. 6.3 and paragraphs 6-7 
of sec. 9]. To summarize the main points, (1) anchors should be placed where they are sensitive 
to the forward process M so that information in the forward data is efficiently "transferred" to 
anchors; (2) anchors should be placed where one's future prediction tasks need them most. The 
second point is referred to as "objective oriented inversion" in Zhang and Rubin [2008a, paragraph 

6 of sec. 9] . Of these two considerations, the second is more important because one's goal is never to 
"re-play" the data-generating forward process whose outcome is already known. When future 
prediction task is not yet known, one should aim for the anchors to achieve overall "good" and 
balanced representation of the entire field y, with attention to regions possessing critical features. 

These considerations also appear in Rubin et al. [2010, sec. 5] (in paragraphs [58] and [62] 
the authors re-term the second criterion "targeted anchor placement"), Murakami et al. [2010, 
sec. 3.1.4], and Murakami [2010, sec. 3.4.4]. 

Regarding the "right" or "optimum" number of anchors to use, Zhang [personal correspondence 
to Murakami and Rubin, 4 October 2008] suggests, "The number of anchors should be determined 
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based on a measure compromising the 'sharpness' of conditioning and the identifiabihty of the 
parameters (anchor values). With a small number of anchors, the conditioning can't be very 
sharp, because the variability in the field conditioning [conditional] on the anchors is still large." 
Zhang [personal correspondence to Rubin, 21 December 2008] further clarifies, "Basically in the 
area of interest, we can distribute inverted anchors according to some recommended sampling 
design. Then we can increase the number of inverted anchors and get prediction results. The right 
number of anchors is achieved when the prediction stabilizes." These suggestions are rephrased in 
Rubin et al. [2010, paragraphs [59-60]] and Murakami [2010, sec. 2.3.5] without implementation. 
Implementation of this idea is likely to be nontrivial. 



6 Sequential assimilation of multiple datasets 

Zhang [2008c, sec. 9] mentions the possibility of incorporating multiple type B datasets in a se- 
quential manner, and outlines the conditions under which this is or is not suitable. This idea also 
appears in Rubin et al. [2010, sec. 7]. The formula (16) in Rubin et al. [2010] is correct up to the 
second line. The third "=" does not hold because 



/ (2) 



One would have equality here if the two datasets z^^^ and zj^^^ are independent, which usually re- 
quires that the spatial domains in which their respective forward processes take place are disjoint 
(and far apart). In practical applications, however, this is unlikely to be the case. Note, in partic- 
ular, that the dependence between z^^^ and z^^ is not diminished even if the two forward processes 
are of incomparable physical natures, for example one is a pumping test and the other is a tracer 
test. The dependence is due to the common Y field that underlies both tests. Another situation 
in which one would have equality in the relation above is that z^^'^ is implied by (^, -iJ^^^) , 

that is, any field y conditional on any particular {9,'&a,'&^h\'&^h^) would reproduce (i.e. predict by 
the forward model) z^^^ exactly. This can not happen in the current context. 

Formula (17) in Rubin et al. [2010] assumes z^^^ and z^^ are independent. As mentioned above, 
this assumption could be valid when the respective forward processes for z^^^ and zj^^ take place 
in disjoint and far-apart regions. The further condition of "far-apart" is needed because of spatial 
correlation in the Y field. When these conditions are met, z^^^ and zj^^ would be disjointly useful 
for their respective spatial regions. In such situations, it may not be appropriate to assume the 
different regions share the same value of the structural parameter 9. Consequently, one may as well 
need to divide the entire spatial domain into isolated sub-regions and study them separately. This 
is not a typical scenario of experimental or observational studies. 

Murakami [2010, sec. 4.2] makes similar attempts and has similar problems. More discussion 
on this issue can be found in Zhang [2008c, sec. 9]. 



7 Comparison with the pilot point method 

Very early development of the anchor idea benefited from the popular "Pilot Point" method (PPM) 
[de Marsily et al., 1984]. This is shown in that Zhang [2008c] used the name "pilot point" pro- 
visionally in describing the new concept, before the name "anchor" occurred to the author. (The 
name "pivot point" occurred before "anchor" but was abandoned; see Zhang [2008a].) Some super- 
ficial similarities notwithstanding, the two methods are very different; see Zhang [2008a] for some 
discussion. 
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As is explained in Rubin et al. [2010, sec. 8.3], an important difference between PPM and 
ancliored inversion is tliat tfie former produces an ensemble of pilot point values that are not a 
random sample of a clearly-defined target distribution, whereas the latter infers the distribution 
of anchor values (along with other parameters) following statistical rules. These observations are 
borrowed from Zhang [2008a], Zhang [2008d], Zhang [2008b], and Zhang [personal correspondence 
to Rubin, 8 May 2008]. (See, e.g., paragraph [104] of Rubin et al. [2010] and the 2nd paragraph of 
sec. 1 in Zhang [2008b].) 

Another issue with PPM is that it requires the geostatistical parameters {9) to be pre-specified, 
and has no provisions for systematically updating these parameters in light of the type B data Zb 
(Zhang [2008a, page 5] and Zhang [2009, sec.6.1]). 

In PPM, the same optimization procedure is applied to each (vector) value ever tried for the 
pilot points, and each value for the pilot points eventually leads to one realization of the field Y. 
As a result, the computational cost is proportional to the number of field realizations created in the 
resultant ensemble. In anchored inversion, inference of the parameter (both 9 and '&) distribution 
requires a large number of field realizations. Once this inference is done, creating field realizations 
(to be used for characterization, analysis, prediction, etc.) is not part of the inversion procedure, 
and the computational expense is essentially negligible. This aspect is discussed in Zhang [2009, 
sees. 6.1, 7.1] 

8 Additional issues 

This section lists some technical inaccuracies in Rubin et al. [2010]. 

In paragraph [26] the authors state that they "could use p{9 \ 'da) on p{'&a \ 9)p{9) to accommodate 
the prior" of 9. This would be correct if and only if p(i?a) is constant, which is unlikely the case in 
this context. 

In paragraph [32], the authors state that "the prior for the entire model appears indirectly in 
equation (3) through the relation p{9 \ 'da) oc p{9)p{'da \ 9)." First, as mentioned above, this relation 
does not hold unless p(i?a) is constant. Second, the model parameter vector contains 9 and t?, hence 
a prior for the "entire" model is a function of both 9 and -d. In fact, the prior for the model is 
provided by p{9, '& \ Za) = p{'&a \ Za)p{9 \ "da-, Za)p{'&b I 6, '&a, Za)- (It can be simplified if Za is error- 
free.) This prior specification makes use of part of the data (z^) and leaves the other part {z^) as 
the "main" data. See Zhang and Rubin [2008a, paragraphs 1-2 of sec. 6.1] or Rubin et al. [2010, 
paragraphs [21-22]]. 

In paragraph [37] appears this statement: "When multiple data types are involved, that would 
include conditional simulation of the Y field followed by forward modeling." This could confuse the 
reader. "Conditional simulation" and "forward modeling" are the basic components of the method; 
they need to be run whether or not "multiple" data "types" are involved. 

Paragraph [44] cites several reasons for the discrepancy between the prediction M(y) and the 
observation Zf,. The cited reasons are not the most important ones. The most important reason 
for this discrepancy is that the field Y conditional on fixed 9 and i9 is random. In fact, the method 
allows all four types of the cited "errors' to be absent, and in that case M{y) would still deviate 
from Zfe. 

The confusing notation "parametric error" appears in paragraph [44] and the confusion is rein- 
forced in paragraph [45] by the statement "The parameter error is represented by e;,, and it covers 
errors in and in the structural parameters 9" . One knows from formula (2) that ej, indicates error 
in the type B data. This does not include "errors" in i?^ and 9. The same misstatement appears in 
the "Notation" . Recall that 'd^ and 9 are model parameters in a Bayesian approach, implying that 
the notion of their "errors" is undefined. 
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In paragraphs [68]-[71], the authors replace the more conventional notation (3 (for linear coeffi- 
cients) in Zhang and Rubin [2008a, Appendix] by m and call it "design matrix" , causing profound 
confusion. "Design matrix" in statistics involves x (spatial coordinates) only; it is not an unknown 
parameter subject to estimation. A better notation would be /3, which multiplies the "design ma- 
trix" {X in Zhang and Rubin [2008a, Appendix]) to get the mean of the spatial variable. When 
the field has a constant mean, the design matrix would be a column matrix of I's and /3 would be 
of length 1 . This is not like what is stated in paragraph [69] , "all the terms in m are equal to /u" . 
The m in formula (12) is meant to be the mean of Y at locations Xa- This mean is a function of 
the structural parameters (in particular, the trend parameters) as well as the location vector Xa- 
It is a vector of length n^, and is not the model parameter m in the previous paragraphs. (The 
length of the model parameter vector is independent of na-) However, its functional relation with 
the structural parameters is lost in (12). A similar confusion occurs in formula (14). 

Following the choice of symbols in "Notation", all the ai^ in section 6.1.2 should be x^^, that 
is, it is the locations of the anchors i?;,. This is not a;?,, which is the locations of type B data zi, (as 
mentioned above, however, the "location of type B data" is not always meaningful in this method). 
The Xa should be x^^, namely the locations of the anchors '&a] but these locations coincide with 

In formula (13), the Za between the pair of '||'s should be "Qh- The formula as is written is not 
a function of 

In formula (14), the z^, should be Za- The former is not available in this context and is not 
related to the quantities the way stated in (14). The m is a function of 9 and x^^^ giving the mean 
of Y at locations a;^,^. The confusion about m has been mentioned earlier. 

In formula (14), the Rx^^^x^ should be Rx^^,xa- This is the correlation between and "Qa, not 
the "conditional" correlation of t?;, given which is the Rx^^^^a ™- (1"^) ^"^^ (l^)- 

The Rxf^\x^ given in (15) is the conditional "correlation" of "i^fe, not conditional "covariance" as 
claimed before formula (14). In Zhang and Rubin [2008a, Appendix], the counterpart to (15) is 
written for the conditional covariance, a'^R. 

The formulas (11)-(15) are taken from the Appendix of Zhang and Rubin [2008a] , with necessary 
adjustment because the Box-Cox transform used in Zhang and Rubin [2008a] is dropped in Rubin 
et al. [2010]. The two papers also differ slightly in choice of symbols. It is not possible to give 
correct versions of (12), (14), and (15) without introducing symbols that are not defined in sections 
6.1.1 and 6.1.2, because the concept of "design matrix" and the symbol m defined in paragraph [68] 
are problematic. 

The right-hand side of formula (18) equals p{za, ziy\6), which is the hkehhood. Optimizing the 
likelihood would lead to ML rather than MAP estimates. Formula (18) would be correct if the 
prior p{9) were flat. However, that would make the formulation non-Bayesian, and hence unrelated 
to MAP. In paragraph [90], p{za \ 6) is referred to as "prior". However, Za are data, which do not 
have "prior". In paragraph [91] appears the relation eb = Zh — M.{y). However, y does not appear 
in (18), hence eb is undefined. 

9 Conclusion 

The method of anchored inversion has a number of notable features including (1) the anchor concept, 
which on the one hand creates a bridge between the spatial attribute field Y and the forward data 
Zb, and on the other hand separates the dimensionality of the model parameter (6 and 'd) from that 
of the numerical grid; and (2) the recognition that the likelihood function is unknown and needs to 
be estimated nonparameterically. The method is general and in a sense modular, although these 
desirable features are not unique to this method. 
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The anchor parameterization may be used in a Bayesian way (e.g. deriving posterior distribu- 
tions) or a non-Bayesian way (e.g. obtaining maximum hkehhood estimates). Zhang and Rubin 
[2008a] presented a Bayesian procedure for using anchors. 

Additional comments on this method can be found in Zhang [2010]. 

To conclude with a little history, the method ("anchored inversion") as introduced in Zhang 
[2008c] was fully described and implemented in Zhang and Rubin [2008a]. The same method was 
re-presented (by copy-pasting and rephrasing) in Rubin et al. [2010] without the knowledge and 
consent of Zhang, the actual author, but with technical errors and misinterpretations. The method 
was fundamentally changed in Zhang [2009], which has turned out to be an intermediate milestone, 
as the method(s) described in these works has now been superseded by a new, and superior, method 
described in Zhang [2011]. 

10 Postscript 

The scandalous practice of Rubin et al. [2010] has been followed by Murakami [2010], Murakami 
et al. [2010], Chen et al. [2012], and Yang et al. [2012]. 

With the unfortunate prospect of propagating misinterpretation and misattribution, Rubin et al. 
[2010] and its follow-ups have been referenced in a number of publications, including DafSon et al. 
[2011]; de Barros et al. [2012]; Fernandez-Garcia et al. [2012]; Kowalsky et al. [2012]; Leube et al. 
[2012]; Lu et al. [2012]; Nowak et al. [2012]; Williams and Maxweh [2011]; Zhou et al. [2012]. 
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